Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learnt skills and their dynamics, and uses it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in
isolation in simpler environments using existing multi-goal RL formulations,
analogous to options or macro-actions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learnt and are unrolled
forward during look-ahead search. Policy search benefits from temporal
abstraction during exploration, though it itself operates over low-level
primitive actions, and thus the resulting policies do not suffer from the
suboptimality and inflexibility caused by coarse skill chaining.
inflexibility caused by coarse skill chaining. We show that the proposed
exploration strategy results in effective learning of complex manipulation
policies faster than current state-of-the-art RL methods, and converges to
better policies than methods that use options or parametrized skills as
building blocks of the policy itself, as opposed to guiding exploration. We
show that the proposed exploration strategy results in effective learning of
complex manipulation policies faster than current state-of-the-art RL methods,
and converges to better policies than methods that use options or parameterized
skills as building blocks of the policy itself, as opposed to guiding
exploration.Comment: This is a pre-print of our paper which is accepted in AAAI 201
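The look-ahead procedure described above can be sketched as a brute-force search over skill sequences, where each candidate sequence is unrolled through the learned coarse dynamics models. This is an illustrative sketch only; the function names, the exhaustive enumeration, and the distance-based scoring are assumptions, not the paper's actual implementation:

```python
import itertools


def lookahead_search(state, skill_models, goal, depth, distance):
    """Enumerate skill sequences up to `depth`, unroll each skill's learned
    coarse dynamics model forward, and return the sequence whose predicted
    terminal state is closest to the goal (illustrative sketch)."""
    best_seq, best_cost = None, float("inf")
    for d in range(1, depth + 1):
        for seq in itertools.product(range(len(skill_models)), repeat=d):
            s = state
            for k in seq:
                s = skill_models[k](s)  # predicted state after skill k completes
            cost = distance(s, goal)
            if cost < best_cost:
                best_seq, best_cost = seq, cost
    return best_seq, best_cost
```

In the paper's setting, the returned sequence would guide exploration, while the learned policy itself still acts over low-level primitive actions.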
Particle Videos Revisited: Tracking Through Occlusions Using Point Trajectories
Tracking pixels in videos is typically studied as an optical flow estimation
problem, where every pixel is described with a displacement vector that locates
it in the next frame. Even though wider temporal context is freely available,
prior efforts to take this into account have yielded only small gains over
2-frame methods. In this paper, we revisit Sand and Teller's "particle video"
approach, and study pixel tracking as a long-range motion estimation problem,
where every pixel is described with a trajectory that locates it in multiple
future frames. We re-build this classic approach using components that drive
the current state-of-the-art in flow and object tracking, such as dense cost
maps, iterative optimization, and learned appearance updates. We train our
models using long-range amodal point trajectories mined from existing optical
flow datasets that we synthetically augment with occlusions. We test our
approach in trajectory estimation benchmarks and in keypoint label propagation
tasks, and compare favorably against state-of-the-art optical flow and feature
tracking methods.
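A minimal sketch of the trajectory representation this view implies: each tracked point carries per-frame positions plus a visibility flag, so it can persist through occlusions rather than being dropped as in 2-frame flow. The class and its fields are invented for illustration and are not the paper's data structure:

```python
import numpy as np


class PointTrajectory:
    """Minimal particle-video style track: per-frame (x, y) positions plus
    a visibility flag, so a point survives occlusion (illustrative sketch)."""

    def __init__(self, num_frames):
        self.xy = np.zeros((num_frames, 2))          # (x, y) per frame
        self.visible = np.ones(num_frames, dtype=bool)

    def occlude(self, t0, t1):
        # Mark frames [t0, t1) as occluded; positions are still estimated.
        self.visible[t0:t1] = False

    def visible_span(self):
        # First and last frame where the point is visible.
        idx = np.flatnonzero(self.visible)
        return int(idx[0]), int(idx[-1])
```

Amodal supervision of this kind (positions defined even while `visible` is False) is what the synthetically augmented occlusions provide.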
Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models
Generalist robot manipulators need to learn a wide variety of manipulation
skills across diverse environments. Current robot training pipelines rely on
humans to provide kinesthetic demonstrations or to program simulation
environments and to code up reward functions for reinforcement learning. Such
human involvement is a major bottleneck to scaling up robot learning
across diverse tasks and environments. We propose Generation to Simulation
(Gen2Sim), a method for scaling up robot skill learning in simulation by
automating generation of 3D assets, task descriptions, task decompositions and
reward functions using large pre-trained generative models of language and
vision. We generate 3D assets for simulation by lifting open-world 2D
object-centric images to 3D using image diffusion models and querying LLMs to
determine plausible physics parameters. Given URDF files of generated and
human-developed assets, we chain-of-thought prompt LLMs to map these to
relevant task descriptions, temporal decompositions, and corresponding Python
reward functions for reinforcement learning. We show Gen2Sim succeeds in
learning policies for diverse long-horizon tasks, where reinforcement learning
with reward functions that are not temporally decomposed fails. Gen2Sim provides a
viable path for scaling up reinforcement learning for robot manipulators in
simulation, both by diversifying and expanding task and environment
development, and by facilitating the discovery of reinforcement-learned
behaviors through temporal task decomposition in RL. Our work contributes
hundreds of simulated assets, tasks and demonstrations, taking a step towards
fully autonomous robotic manipulation skill acquisition in simulation.
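To illustrate the kind of temporally decomposed Python reward function such a pipeline might generate, here is a hypothetical per-stage reward for an "open drawer, then place cube" task. All state keys, thresholds, and stage names are invented for illustration, not outputs of Gen2Sim:

```python
import numpy as np


def reward(state, stage):
    """Hypothetical per-stage reward for 'open drawer, then place cube'.
    Returns (reward, stage_done); keys and thresholds are invented."""
    if stage == "reach_handle":
        d = np.linalg.norm(state["gripper_pos"] - state["handle_pos"])
        return -d, d < 0.02          # dense shaping toward the handle
    if stage == "open_drawer":
        q = state["drawer_joint"]    # prismatic joint opening in meters
        return q, q > 0.3
    if stage == "place_cube":
        d = np.linalg.norm(state["cube_pos"] - state["drawer_pos"])
        return -d, d < 0.03
    raise ValueError(f"unknown stage: {stage}")
```

Training one RL stage at a time against such per-stage rewards is what lets long-horizon behaviors emerge where a single monolithic reward fails.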